Abstract: It is shown that for any binary-input discrete memoryless channel $W$ with symmetric capacity $I(W)$ and any rate $R < I(W)$, the probability of block decoding error for polar coding under successive cancellation decoding satisfies $P_e \le 2^{-N^{\beta}}$ for any $\beta < \frac{1}{2}$ when the block-length $N$ is large enough.

I. Introduction 

Channel polarization is a method, introduced in [1], for constructing a class of capacity-achieving codes, called polar codes, on binary-input symmetric channels. Polar codes are of interest theoretically because they have a well-defined construction rule (that involves no trial-and-error) and are provably capacity-achieving. The aim of this paper is to strengthen the results of [1] on the probability of block decoding error for polar codes. We begin by giving the notation and the general problem set-up.

Let $W : \mathcal{X} \to \mathcal{Y}$ be an arbitrary binary-input DMC (B-DMC) with input alphabet $\mathcal{X} = \{0, 1\}$, output alphabet $\mathcal{Y}$, and transition probabilities $\{W(y|x) : x \in \mathcal{X}, y \in \mathcal{Y}\}$. Let $I(W)$ denote the symmetric capacity of $W$, defined as the mutual information (in bits) between the input and output terminals of $W$ when the input is chosen from the uniform distribution on $\mathcal{X}$. This parameter takes values in $[0, 1]$ and sets a limit on achievable rates across the channel $W$ using codes that employ the channel input letters with equal frequency. Let $Z(W) = \sum_{y \in \mathcal{Y}} \sqrt{W(y|0) W(y|1)}$. This parameter also takes values in $[0, 1]$ and is an upper bound on the probability of ML decision error when the channel is used only once to transmit either a 0 or a 1. We will use $Z(W)$ as a measure of reliability.

The parameter $I(W)$ is of a more fundamental nature than $Z(W)$; however, $Z(W)$ will play a more central role in the following analysis since it is more readily tractable. A useful pair of inequalities relating these two parameters is

$$I(W)^2 + Z(W)^2 \le 1, \qquad I(W) + Z(W) \ge 1, \tag{1}$$

both proved in [1].
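As a small numerical illustration of these definitions, the following sketch computes $I(W)$ and $Z(W)$ for a B-DMC given by its two rows of transition probabilities and evaluates both sides of (1). The function names and the two example channels (a binary symmetric channel and a binary erasure channel) are assumptions chosen for illustration; they do not appear in the paper.

```python
import math

def symmetric_capacity(w0, w1):
    """I(W) in bits for a B-DMC with rows w0[y] = W(y|0) and w1[y] = W(y|1)."""
    total = 0.0
    for p0, p1 in zip(w0, w1):
        q = 0.5 * (p0 + p1)              # output probability under a uniform input
        for p in (p0, p1):
            if p > 0.0:
                total += 0.5 * p * math.log2(p / q)
    return total

def bhattacharyya(w0, w1):
    """Z(W) = sum over y of sqrt(W(y|0) W(y|1))."""
    return sum(math.sqrt(p0 * p1) for p0, p1 in zip(w0, w1))

# Hypothetical example channels (illustration only).
bsc = ([0.9, 0.1], [0.1, 0.9])             # binary symmetric channel, crossover 0.1
bec = ([0.7, 0.3, 0.0], [0.0, 0.3, 0.7])   # binary erasure channel, erasure 0.3

for name, (w0, w1) in (("BSC(0.1)", bsc), ("BEC(0.3)", bec)):
    i_w, z_w = symmetric_capacity(w0, w1), bhattacharyya(w0, w1)
    # Left sides of the two inequalities in (1).
    print(name, "I =", i_w, "Z =", z_w, "I^2 + Z^2 =", i_w ** 2 + z_w ** 2, "I + Z =", i_w + z_w)
```

For the erasure channel the second inequality in (1) holds with equality up to rounding, which is why the sketch prints the two sums rather than comparing them exactly.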



A. A channel transform 

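Let $\mathcal{W}$ denote the class of all B-DMCs as defined above. Consider a channel transform $W \mapsto (W^-, W^+)$ that maps $\mathcal{W}$ to $\mathcal{W}^2$. Suppose the transform operates on an input channel $W : \mathcal{X} \to \mathcal{Y}$ to generate the channels $W^- : \mathcal{X} \to \mathcal{Y}^2$ and $W^+ : \mathcal{X} \to \mathcal{Y}^2 \times \mathcal{X}$ with transition probabilities

$$\begin{aligned} W^-(y_1 y_2 | x_1) &= \sum_{x_2 \in \mathcal{X}} \tfrac{1}{2} W(y_1 | x_1 \oplus x_2) W(y_2 | x_2), \\ W^+(y_1 y_2 x_1 | x_2) &= \tfrac{1}{2} W(y_1 | x_1 \oplus x_2) W(y_2 | x_2), \end{aligned} \tag{2}$$

where $\oplus$ denotes mod-2 addition.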

Notice that in an actual implementation of this transform one needs two independent copies of $W$ to generate $W^-$ and $W^+$. In that sense, the transform preserves symmetric capacity:

$$I(W^-) + I(W^+) = 2 I(W), \tag{3}$$

which is a direct consequence of the chain rule of mutual information. As for the other parameter, we have

$$\begin{aligned} Z(W^+) &= Z(W)^2, \\ Z(W) \le Z(W^-) &\le 2 Z(W) - Z(W)^2, \end{aligned} \tag{4}$$

whose proofs can be found in [1]. Thus, the overall reliability is improved in the sense that

$$Z(W^-) + Z(W^+) \le 2 Z(W), \tag{5}$$

with $W^+$ more reliable than $W$, and $W^-$ less reliable than $W$.
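The relations (3), (4) and (5) can be checked numerically on a small example. The sketch below builds $W^-$ and $W^+$ from the transition probabilities in (2), flattening the output tuples into a single index. The representation `w[x][y]`, the helper names, and the example channel are assumptions made for this illustration.

```python
import math
from itertools import product

def capacity(w):
    """Symmetric capacity I(W) in bits; w[x][y] = W(y|x) for x in {0, 1}."""
    total = 0.0
    for p0, p1 in zip(w[0], w[1]):
        q = 0.5 * (p0 + p1)
        total += sum(0.5 * p * math.log2(p / q) for p in (p0, p1) if p > 0.0)
    return total

def bhattacharyya(w):
    """Z(W) = sum over y of sqrt(W(y|0) W(y|1))."""
    return sum(math.sqrt(p0 * p1) for p0, p1 in zip(w[0], w[1]))

def transform(w):
    """Return (W-, W+) following (2), each as two rows over a flattened output alphabet."""
    ny = len(w[0])
    # W-(y1 y2 | x1) = sum over x2 of 1/2 W(y1 | x1 xor x2) W(y2 | x2)
    minus = [[sum(0.5 * w[x1 ^ x2][y1] * w[x2][y2] for x2 in (0, 1))
              for y1, y2 in product(range(ny), repeat=2)]
             for x1 in (0, 1)]
    # W+(y1 y2 x1 | x2) = 1/2 W(y1 | x1 xor x2) W(y2 | x2)
    plus = [[0.5 * w[x1 ^ x2][y1] * w[x2][y2]
             for y1, y2, x1 in product(range(ny), range(ny), (0, 1))]
            for x2 in (0, 1)]
    return minus, plus

w = [[0.9, 0.1], [0.1, 0.9]]   # hypothetical example: BSC with crossover 0.1
w_minus, w_plus = transform(w)
z, z_minus, z_plus = bhattacharyya(w), bhattacharyya(w_minus), bhattacharyya(w_plus)

print("I(W-) + I(W+) =", capacity(w_minus) + capacity(w_plus), "  2 I(W) =", 2 * capacity(w))
print("Z(W+) =", z_plus, "  Z(W)^2 =", z * z)
print("Z(W), Z(W-), 2Z(W) - Z(W)^2 =", z, z_minus, 2 * z - z * z)
```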

B. Polarization process 

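Let $(\Omega, \mathcal{F}, P)$ be a probability space and suppose that $\{B_n : n = 1, 2, \ldots\}$ is a sequence of i.i.d. random variables defined on this space with

$$P(B_1 = 0) = P(B_1 = 1) = \tfrac{1}{2}. \tag{6}$$

For $n \ge 1$, let $\mathcal{F}_n$ be the $\sigma$-algebra generated by $(B_1, \ldots, B_n)$. We may take $\mathcal{F} = \bigcup_{n=1}^{\infty} \mathcal{F}_n$.

Fix a channel $W \in \mathcal{W}$. Define a random sequence of channels $\{W_n \in \mathcal{W} : n \ge 0\}$ that starts at $W_0 = W$, and at time $n \ge 1$ sets

$$W_n = \begin{cases} W_{n-1}^- & \text{if } B_n = 0, \\ W_{n-1}^+ & \text{if } B_n = 1, \end{cases} \tag{7}$$

where the channels on the right side are defined by the transform $W_{n-1} \mapsto (W_{n-1}^-, W_{n-1}^+)$. Define two random processes $\{I_n : n = 0, 1, \ldots\}$ and $\{Z_n : n = 0, 1, \ldots\}$ by setting $I_n := I(W_n)$ and $Z_n := Z(W_n)$.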



Observation 1:

(i) $\{(I_n, \mathcal{F}_n)\}$ is a bounded martingale on $[0, 1]$ and converges a.s. to a r.v. $I_\infty$.

(ii) $\{(Z_n, \mathcal{F}_n)\}$ is a bounded supermartingale on $[0, 1]$ and converges a.s. to a r.v. $Z_\infty$.

The martingale and supermartingale properties follow from (3) and (5), and the convergence properties from general results on bounded martingales. It was shown in [1], and we will show in the sequel, that the limit random variables $I_\infty$ and $Z_\infty$ are a.s. 0-1 valued. It then follows that $I_\infty + Z_\infty = 1$ in view of (1). Since $E[I_\infty] = I_0 = I(W)$, we have $P(I_\infty = 1) = I(W)$ and $P(I_\infty = 0) = 1 - I(W)$. Consequently $P(Z_\infty = 0) = I(W)$ and $P(Z_\infty = 1) = 1 - I(W)$. Thus the sequence of channels $\{W_n\}$ polarizes with probability one: the channels become perfect with probability $I(W)$ and useless with probability $1 - I(W)$.
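The polarization effect can be observed exactly on a simple example. The sketch below assumes a binary erasure channel with erasure probability $z_0$, for which $Z(W) = z_0$, $I(W) = 1 - z_0$, and the upper bound in (4) holds with equality, so that $Z_n$ can be tracked by the scalar recursion $z \mapsto 2z - z^2$ or $z \mapsto z^2$. Rather than sampling the bits $B_n$, it enumerates all $2^n$ equally likely paths. The choice of channel, the tolerance `eps`, and the function name are assumptions for illustration.

```python
def bec_z_values(z0, n):
    """All 2**n values of Z_n for a BEC with Z(W) = z0; each path has probability 2**-n."""
    values = [z0]
    for _ in range(n):
        # Each channel splits into W- (Z -> 2Z - Z^2) and W+ (Z -> Z^2).
        values = [v for z in values for v in (2 * z - z * z, z * z)]
    return values

z0 = 0.4     # hypothetical erasure probability, so I(W) = 0.6
eps = 1e-3   # hypothetical tolerance for calling a channel "perfect" or "useless"
for n in (4, 8, 12, 16):
    zs = bec_z_values(z0, n)
    good = sum(z <= eps for z in zs) / len(zs)
    bad = sum(z >= 1 - eps for z in zs) / len(zs)
    print("n =", n, " P(Z_n <= eps) =", good, " P(Z_n >= 1 - eps) =", bad,
          " unpolarized =", 1 - good - bad)
```

As $n$ grows, the first fraction should drift toward $I(W)$ and the second toward $1 - I(W)$, in line with Observation 1 and the discussion that follows it.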

C. Polar coding 

Channel polarization was used in [1] to develop a channel coding scheme called polar coding. Polar codes are a class of block codes with block-lengths constrained to $N = 2^n$, $n \ge 0$. These codes can be encoded in complexity $O(N \log N)$ and decoded by a successive cancellation decoder also in complexity $O(N \log N)$. These complexity bounds hold uniformly for all rates $R \in [0, 1]$, although for $R > I(W)$ they have no practical relevance.

To state the results precisely, let $P_e(N, R)$ denote the best achievable block error probability under successive cancellation decoding for polar coding with block length $N$ and rate $R$. It was shown in [1] that for any given channel $W \in \mathcal{W}$, any $n$, and any $\gamma \in [0, 1]$, there exists a polar code with block-length $N = 2^n$, whose rate $R$ and probability of block error under successive cancellation decoding $P_e$ satisfy

$$R \ge P(Z_n \le \gamma), \tag{8}$$

$$P_e \le N\gamma. \tag{9}$$

The main result of [1] in this regard was to show that for any $R < I(W)$ the relation (8) can be satisfied for large $N$ by taking the parameter $\gamma$ as a function $\gamma(N, R) = o(N^{-5/4})$. This enabled [1] to conclude from (9) that $P_e(N, R) = o(N^{-1/4})$ for any fixed $R < I(W)$.
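The trade-off expressed by (8) and (9) can be made concrete with a short computation. The sketch below evaluates the two right-hand sides, $P(Z_n \le \gamma)$ and $N\gamma$, for a few thresholds $\gamma$. It assumes a binary erasure channel, for which $Z_n$ follows the exact recursion $z \mapsto 2z - z^2$ or $z \mapsto z^2$; the channel, the block length, and the thresholds are hypothetical choices.

```python
def bec_z_values(z0, n):
    """All 2**n equally likely values of Z_n for a BEC with Z(W) = z0."""
    values = [z0]
    for _ in range(n):
        values = [v for z in values for v in (2 * z - z * z, z * z)]
    return values

def rate_and_bound(z0, n, gamma):
    """Right-hand sides of (8) and (9): P(Z_n <= gamma) and N * gamma with N = 2**n."""
    zs = bec_z_values(z0, n)
    return sum(z <= gamma for z in zs) / len(zs), (2 ** n) * gamma

z0, n = 0.5, 14   # hypothetical BEC with I(W) = 0.5 and block length N = 16384
for gamma in (1e-4, 1e-8, 1e-12):
    rate, pe_bound = rate_and_bound(z0, n, gamma)
    print("gamma =", gamma, " guaranteed rate =", rate, " bound on P_e =", pe_bound)
```

Lowering $\gamma$ tightens the error bound in (9) at the cost of a smaller guaranteed rate in (8); the results of this paper concern how fast $\gamma$ can decrease with $N$ while the rate stays close to $I(W)$.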

D. Summary of results 

In this paper we improve the results of [1] by proving the following

Theorem 1: Let $W$ be any B-DMC with $I(W) > 0$. Let $R < I(W)$ and $\beta < \frac{1}{2}$ be fixed. Then, for $N = 2^n$, $n \ge 0$, the best achievable block error probability for polar coding under successive cancellation decoding at block length $N$ and rate $R$ satisfies

$$P_e(N, R) = O(2^{-N^{\beta}}). \tag{10}$$

□ 

Remark 1: The bound (10) depends only on whether $R < I(W)$, but otherwise is not sensitive to $R$. Determining sharper asymptotic results on $P_e(N, R)$ that display a more refined dependence on $R$ remains a challenging open problem.

This result will follow from (8) and (9) as a corollary to the first half of the following

Theorem 2: Let $W$ be any B-DMC. For any fixed $\beta < \frac{1}{2}$,

$$\liminf_{n \to \infty} P(Z_n \le 2^{-2^{n\beta}}) = I(W). \tag{11}$$

Conversely, if $I(W) < 1$, then for any fixed $\beta > \frac{1}{2}$,

$$\liminf_{n \to \infty} P(Z_n \ge 2^{-2^{n\beta}}) = 1. \tag{12}$$

□ 
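The thresholds $2^{-2^{n\beta}}$ in Theorem 2 are far below the range of floating-point numbers for moderate $n$, so a numerical illustration is easier in the log domain. The sketch below tracks $\log_2 Z_n$ along all $2^n$ paths, assuming a binary erasure channel (for which $Z_n$ obeys the exact recursion $z \mapsto 2z - z^2$ or $z \mapsto z^2$), and reports the fraction of paths with $Z_n \le 2^{-2^{n\beta}}$ for one value of $\beta$ below $\frac{1}{2}$ and one above. The channel and the values of $n$ and $\beta$ are hypothetical, and at such small $n$ the numbers only hint at the asymptotic statement.

```python
import math

def log2_z_values(z0, n):
    """log2(Z_n) along all 2**n paths of a BEC with Z(W) = z0, kept in the log domain."""
    logs = [math.log2(z0)]
    for _ in range(n):
        nxt = []
        for lz in logs:
            z = 2.0 ** lz                          # underflows harmlessly to 0.0 when Z is tiny
            nxt.append(lz + math.log2(2.0 - z))    # W-: Z -> 2Z - Z^2 = Z (2 - Z)
            nxt.append(2.0 * lz)                   # W+: Z -> Z^2
        logs = nxt
    return logs

def fraction_below(z0, n, beta):
    """Fraction of paths with Z_n <= 2^(-2^(n * beta))."""
    threshold = -(2.0 ** (n * beta))               # log2 of the threshold
    logs = log2_z_values(z0, n)
    return sum(lz <= threshold for lz in logs) / len(logs)

z0 = 0.5   # hypothetical BEC with I(W) = 0.5
for n in (8, 12, 16):
    print("n =", n,
          " beta = 0.3:", fraction_below(z0, n, 0.3),
          " beta = 0.7:", fraction_below(z0, n, 0.7))
```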

The rest of this paper is devoted to proving Theorem 2. The analysis will be carried out using the supermartingale $\{Z_n\}$. Section II abstracts out the general properties of this supermartingale so as to carry out the analysis in a more general framework unencumbered by the details of the original information-theoretic context. Theorem 2 is restated in Section II in a general setting and proved in the sections that follow. In Section V we state some open problems.

II. Problem restatement 

Let the probability space $(\Omega, \mathcal{F}, P)$, the Bernoulli sequence $\{B_n : n = 1, 2, \ldots\}$, and the $\sigma$-algebras $\{\mathcal{F}_n\}$ be defined as in Section I-B above. We define the following class of random processes on $(\Omega, \mathcal{F}, P)$.

Definition 1: For each $z_0 \in (0, 1)$, define $\mathcal{Z}_{z_0}$ as the class of random processes $\{Z_n : n = 0, 1, \ldots\}$ such that the process begins at $Z_0 = z_0$, $Z_n$ is measurable with respect to $\mathcal{F}_n$, and follows trajectories satisfying

$$Z_{n+1} = Z_n^2 \quad \text{if } B_{n+1} = 1, \tag{13}$$

$$Z_{n+1} \in [Z_n, 2Z_n - Z_n^2] \quad \text{if } B_{n+1} = 0, \tag{14}$$

for $n \ge 0$. Let $\mathcal{Z} := \bigcup_{z_0 \in (0,1)} \mathcal{Z}_{z_0}$.

The class $\mathcal{Z}$ contains the processes $\{Z_n\}$ that were defined in Section I for all non-trivial channels $W \in \mathcal{W}$ for which $0 < Z(W) < 1$. The cases $z_0 = 0$ and $z_0 = 1$ are excluded from the definition since these lead to trivial processes which only complicate the statement of the results. Notice that the definition of $\mathcal{Z}$ makes no reference to the information-theoretic origin of the problem, making the rest of the discussion fully self-contained.

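Observation 2: For any $\{Z_n\} \in \mathcal{Z}$, the following hold.

(i) $Z_n \in (0, 1)$ for all $n \ge 0$.

(ii) $\{(Z_n, \mathcal{F}_n)\}$ is a bounded supermartingale.

(iii) $\{Z_n\}$ converges a.s. and in $L^1$ to a random variable $Z_\infty$, which is 0-1 valued a.s.

The first two observations are obvious. That $\{Z_n\}$ converges a.s. and in $L^1$ is by general theorems on bounded supermartingales. Convergence in $L^1$ implies that $E[|Z_{n+1} - Z_n|] \to 0$. But, since $Z_{n+1} = Z_n^2$ on the event $\{B_{n+1} = 1\}$, which has probability $\frac{1}{2}$ and is independent of $Z_n$, we have $E[|Z_{n+1} - Z_n|] \ge \frac{1}{2} E[Z_n - Z_n^2] \ge 0$, which implies $E[Z_n(1 - Z_n)] \to 0$ and $E[Z_\infty(1 - Z_\infty)] = 0$. Thus $Z_\infty$ equals 0 or 1 a.s.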

We will prove Theorem 2 by proving the following equivalent

Theorem 3: For any $\{Z_n\} \in \mathcal{Z}$ and $\beta < \frac{1}{2}$, we have

$$\liminf_{n \to \infty} P(Z_n \le 2^{-2^{n\beta}}) \ge P(Z_\infty = 0); \tag{15}$$

conversely, for $\beta > \frac{1}{2}$,

$$\liminf_{n \to \infty} P(Z_n \ge 2^{-2^{n\beta}}) = 1. \tag{16}$$

The proof of the converse part (16) will be given in the next section. The direct part (15) will be proved in Section IV.

III. Proof of the converse part 
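Fix a process $\{Z_n\} \in \mathcal{Z}$. Fix $\beta > 1/2$ and put $\delta_n(\beta) := 2^{-2^{n\beta}}$. Let $\{\tilde{Z}_i\}$ be defined as the random process

$$\tilde{Z}_0 = Z_0, \qquad \tilde{Z}_{i+1} = \begin{cases} \tilde{Z}_i^2 & \text{if } B_{i+1} = 1, \\ \tilde{Z}_i & \text{if } B_{i+1} = 0, \end{cases} \qquad i \ge 0.$$

A comparison with (13) and (14) shows that $\{\tilde{Z}_i\}$ is dominated by $\{Z_i\} \in \mathcal{Z}$, and thus

$$P(Z_n \ge \delta_n) \ge P(\tilde{Z}_n \ge \delta_n). \tag{17}$$

Notice that

$$\tilde{Z}_n = Z_0^{2^L} \tag{18}$$

with $L = \sum_{i=1}^{n} B_i$. Thus,

$$P(\tilde{Z}_n \ge \delta_n) = P(L + \log_2 \log_2(1/Z_0) \le n\beta). \tag{19}$$

As $\beta > \frac{1}{2}$ and $Z_0 > 0$, by the law of large numbers, this probability goes to 1 as $n$ increases, yielding (16).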
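Since $L$ in (18) is binomially distributed with parameters $n$ and $\frac{1}{2}$, the probability on the right side of (19) can be computed exactly. The sketch below does so with integer arithmetic; the function name and the sample values of $z_0$, $n$ and $\beta$ are assumptions for illustration.

```python
import math

def converse_probability(z0, n, beta):
    """P(L + log2 log2(1/z0) <= n * beta) for L ~ Binomial(n, 1/2), as in (19)."""
    limit = math.floor(n * beta - math.log2(math.log2(1.0 / z0)))
    if limit < 0:
        return 0.0
    # Count the bit sequences with at most `limit` ones, then normalize by 2**n.
    count = sum(math.comb(n, ones) for ones in range(min(limit, n) + 1))
    return count / 2 ** n

z0 = 0.5   # hypothetical starting value Z_0
for n in (50, 200, 800):
    print("n =", n,
          " beta = 0.45:", converse_probability(z0, n, 0.45),
          " beta = 0.55:", converse_probability(z0, n, 0.55))
```

For $\beta$ above $\frac{1}{2}$ the probability climbs toward 1 as $n$ grows, which is the law-of-large-numbers step used to obtain (16); for $\beta$ below $\frac{1}{2}$ the same expression decays, so this simple argument gives no information in that regime.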

IV. Proof of the direct part 

Definition 2: Given a process $\{Z_n\} \in \mathcal{Z}$ and a sequence of reals $\{f_n\} \subset [0, 1]$ convergent to 0, we will say that $\{f_n\}$ is asymptotically dominating (a.d.) for $\{Z_n\}$, and write $Z_n \prec f_n$, to mean that

$$\liminf_{n \to \infty} P(Z_n \le f_n) \ge P(Z_\infty = 0).$$

We will say that $\{f_n\}$ is universally dominating (u.d.) for $\{Z_n\}$ if, for any fixed $k \ge 0$, the sequence $\{f_{n+k}\}$ is a.d. for $\{Z_n\}$.

In this notation, the direct part of Theorem 3 claims that, for $\beta < \frac{1}{2}$, the sequence $\{2^{-2^{n\beta}}\}$ is a.d. for every process in $\mathcal{Z}$. We will prove this claim in several steps. First, we define in Section IV-A a subclass of processes in $\mathcal{Z}$ called extremal processes. Next, we show in Section IV-B that a sequence $\{f_n\}$ is a.d. for the class $\mathcal{Z}$ if it is u.d. for the subclass of extremal processes. In Section IV-C we show that $\{\rho^n\}$ with $\rho \in (\frac{3}{4}, 1)$ is a.d. for every extremal process. In Section IV-D we use this result to show that, for any fixed $\beta < \frac{1}{2}$, the sequence $\{2^{-2^{n\beta}}\}$ is u.d. for extremal processes.



A. Extremal processes 

Definition 3: A process $\{Z_n\} \in \mathcal{Z}$ is called extremal if

$$Z_{n+1} = \begin{cases} Z_n^2 & \text{if } B_{n+1} = 1, \\ 2Z_n - Z_n^2 & \text{if } B_{n+1} = 0. \end{cases} \tag{20}$$

The extremal process in $\mathcal{Z}_{z_0}$ will be denoted by $\{Z_n^{(z_0)}\}$ when we need to refer to it explicitly.

Note that the recursion for an extremal process can be written alternatively as

$$Z_{n+1} = Z_n^2 \quad \text{if } B_{n+1} = 1, \tag{21}$$

$$(1 - Z_{n+1}) = (1 - Z_n)^2 \quad \text{if } B_{n+1} = 0, \tag{22}$$

and also as

$$Z_{n+1} = Z_n + X_n Z_n (1 - Z_n), \qquad n \ge 0, \tag{23}$$

where $X_n = (1 - 2B_{n+1})$ is a $\pm 1$-valued random process. These forms emphasize the symmetric nature of the extremal process.

We state some properties of extremal processes that follow immediately from Observation 2.

Observation 3: For $\{Z_n\}$ any extremal process, in addition to Observation 2 we have

(i) $\{Z_n\}$ is a Markov process.

(ii) $\{Z_n\}$ is a bounded martingale.

(iii) $P(Z_\infty = 0) = 1 - Z_0$, $P(Z_\infty = 1) = Z_0$.

The term extremal is justified by the following

Observation 4:

(i) Every process $\{Z_n\} \in \mathcal{Z}_{z_0}$ is dominated by $\{Z_n^{(z_0)}\}$ on a sample function basis, i.e., $Z_n \le Z_n^{(z_0)}$.

(ii) The extremal process $\{Z_n^{(\alpha)}\}$ is dominated by $\{Z_n^{(\beta)}\}$ on a sample function basis for all $0 < \alpha \le \beta < 1$.
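The martingale property in Observation 3 and the sample-path domination in Observation 4(ii) can be checked by brute force for small $n$. The sketch below drives two extremal processes with the same bit sequence and enumerates all $2^n$ sequences. The function names, starting values, and numerical tolerances are assumptions for illustration.

```python
from itertools import product

def extremal_path(z0, bits):
    """Z_n of the extremal process (20) started at z0 and driven by bits B_1, ..., B_n."""
    z = z0
    for b in bits:
        z = z * z if b == 1 else 2 * z - z * z
    return z

def extremal_step_alt(z, b):
    """One step written in the symmetric form z + x z (1 - z) with x = 1 - 2b."""
    return z + (1 - 2 * b) * z * (1 - z)

n = 10
paths = list(product((0, 1), repeat=n))

# Martingale property: the average of Z_n over all equally likely paths stays at z0.
for z0 in (0.2, 0.5):
    mean = sum(extremal_path(z0, bits) for bits in paths) / len(paths)
    print("z0 =", z0, " E[Z_n] =", mean)

# Sample-path domination: a larger starting point stays above, bit sequence by bit sequence
# (a small tolerance absorbs floating-point rounding near 1).
alpha, beta = 0.3, 0.6
print(all(extremal_path(alpha, bits) <= extremal_path(beta, bits) + 1e-12 for bits in paths))

# The symmetric form agrees with (20) for both bit values.
print(all(abs(extremal_step_alt(0.37, b) - extremal_path(0.37, (b,))) < 1e-12 for b in (0, 1)))
```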

B. A reduction argument 

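Proposition 1: If $\{f_n\}$ is a u.d. sequence over the class of extremal processes in $\mathcal{Z}$, then $\{f_n\}$ is a.d. over the class $\mathcal{Z}$.

Proof: Fix a process $\{Z_n\}$ in $\mathcal{Z}$ and a sequence $\{f_n\}$ that is u.d. over the class of extremal processes. For any $k \ge 0$, $n \ge 0$, and $\delta \in (0, 1)$, we trivially have

$$P(Z_{k+n} \le f_{k+n}) \ge P(Z_{k+n} \le f_{k+n} \mid Z_k \le \delta) \, P(Z_k \le \delta). \tag{24}$$

Combining the observations

$$P(Z_{k+n} \le f_{k+n} \mid Z_k \le \delta) \ge P(Z_n^{(\delta)} \le f_{k+n})$$

and

$$\liminf_{n \to \infty} P(Z_n^{(\delta)} \le f_{n+k}) \ge (1 - \delta)$$

with (24), we see that for any fixed $k \ge 0$

$$\liminf_{n \to \infty} P(Z_n \le f_n) = \liminf_{n \to \infty} P(Z_{n+k} \le f_{n+k}) \ge (1 - \delta) P(Z_k \le \delta).$$

Since this is true for all $k$, we obtain

$$\begin{aligned} \liminf_{n \to \infty} P(Z_n \le f_n) &\ge (1 - \delta) \liminf_{k \to \infty} P(Z_k \le \delta) \\ &\ge (1 - \delta) P(\liminf_{k \to \infty} Z_k \le \delta) \\ &= (1 - \delta) P(Z_\infty = 0), \end{aligned}$$

where the second line follows by Fatou's lemma and the third by the a.s. convergence of $\{Z_k\}$ to the 0-1 valued $Z_\infty$. Letting $\delta \to 0+$, we obtain

$$\liminf_{n \to \infty} P(Z_n \le f_n) \ge P(Z_\infty = 0),$$

which completes the proof. ■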

C. An asymptotically dominating sequence 

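Proposition 2: For any $\rho \in (\frac{3}{4}, 1)$, the sequence $\{\rho^n\}$ is a.d. over the class of extremal processes.

To prove this statement, let us fix $\{Z_n\}$ as an extremal process in $\mathcal{Z}$ with $Z_0 = z_0$ for some $z_0 \in (0, 1)$.

Let $Q_n := Z_n(1 - Z_n)$. Then $Q_n \in (0, \frac{1}{4}]$ and

$$Q_{n+1} = \begin{cases} Z_n^2 (1 - Z_n^2) & \text{if } B_{n+1} = 1, \\ (2Z_n - Z_n^2)(1 - 2Z_n + Z_n^2) & \text{if } B_{n+1} = 0 \end{cases} \;=\; Q_n \cdot \begin{cases} Z_n (1 + Z_n) & \text{if } B_{n+1} = 1, \\ (1 - Z_n)(2 - Z_n) & \text{if } B_{n+1} = 0. \end{cases} \tag{25}$$

Lemma 1 ([2]): $E[Q_n^{1/2}] \le \frac{1}{2} (\frac{3}{4})^{n/2}$.

Proof: Note that $\frac{1}{2}\sqrt{z(1 + z)} + \frac{1}{2}\sqrt{(1 - z)(2 - z)} \le \sqrt{3/4}$ when $z \in [0, 1]$. So, by (25), $E[Q_{n+1}^{1/2} \mid Q_n] \le Q_n^{1/2} (\frac{3}{4})^{1/2}$. Thus $E[Q_n^{1/2}] \le E[Q_0^{1/2}] (\frac{3}{4})^{n/2} \le \frac{1}{2} (\frac{3}{4})^{n/2}$.

By Markov's inequality, we obtain

Corollary 1: $P(Q_n \ge \rho^n) \le \frac{1}{2} (\frac{3}{4\rho})^{n/2}$ for $\rho > 0$.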
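Both ingredients of Lemma 1 lend themselves to a quick numerical check. The sketch below evaluates the elementary inequality used in the proof on a grid of $z$ values, and then computes $E[Q_n^{1/2}]$ exactly for an extremal process by enumerating all $2^n$ paths, printing it next to the bound $\frac{1}{2}(\frac{3}{4})^{n/2}$. The grid resolution, the starting value $z_0 = 0.3$, and the function names are assumptions for illustration.

```python
import math

def g(z):
    """Average one-step growth factor of sqrt(Q), read off from (25)."""
    return 0.5 * math.sqrt(z * (1 + z)) + 0.5 * math.sqrt((1 - z) * (2 - z))

# The maximum over a grid on [0, 1] should not exceed sqrt(3/4).
grid_max = max(g(i / 10000) for i in range(10001))
print("max of g on grid =", grid_max, "  sqrt(3/4) =", math.sqrt(0.75))

def mean_sqrt_q(z0, n):
    """E[Q_n^(1/2)] for the extremal process started at z0, over all 2**n paths."""
    values = [z0]
    for _ in range(n):
        values = [v for z in values for v in (2 * z - z * z, z * z)]
    # max(..., 0.0) only guards against rounding; Q_n is nonnegative in exact arithmetic.
    return sum(math.sqrt(max(z * (1 - z), 0.0)) for z in values) / len(values)

for n in (2, 6, 10):
    print("n =", n, " E[sqrt(Q_n)] =", mean_sqrt_q(0.3, n),
          "  bound =", 0.5 * 0.75 ** (n / 2))
```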
We now turn this into a bound on $Z_n$.

Lemma 2: Let $f_n(\rho) := \frac{1 - \sqrt{1 - 4\rho^n}}{2}$ if $1 - 4\rho^n \ge 0$, and $f_n(\rho) := 1$ otherwise. Then, $Z_n \prec f_n(\rho)$ for all $\rho \in (\frac{3}{4}, 1)$.

Proof: Fix $\rho \in (\frac{3}{4}, 1)$ and let $f_n = f_n(\rho)$. Note that for $n$ large enough so that $1 - 4\rho^n \ge 0$, we have

$$\{Q_n \le \rho^n\} = \{Z_n \le f_n\} \cup \{Z_n \ge 1 - f_n\}, \tag{26}$$

where the sets on the right side are disjoint. So, for $n$ large enough

$$P(Q_n \le \rho^n) = P(Z_n \le f_n) + P(Z_n \ge 1 - f_n), \tag{27}$$

which gives

$$\liminf_{n \to \infty} P(Q_n \le \rho^n) \le \liminf_{n \to \infty} P(Z_n \le f_n) + \limsup_{n \to \infty} P(Z_n \ge 1 - f_n). \tag{28}$$

Since $\rho > \frac{3}{4}$, the left side of the above equation equals 1 by Corollary 1. Since $f_n$ is monotonically decreasing,

$$\limsup_{n \to \infty} P(Z_n \ge 1 - f_n) \le \limsup_{n \to \infty} P(Z_n \ge 1 - f_k) \tag{29}$$

for any $k \ge 1$. But $\limsup_{n \to \infty} P(Z_n \ge 1 - f_k) = z_0$ for every $k$ with $f_k < 1$. Thus,

$$\liminf_{n \to \infty} P(Z_n \le f_n) \ge 1 - z_0, \tag{30}$$

which means that $Z_n \prec f_n$, as claimed. ■

The proof of Proposition 2 will be complete if we show that for every $\rho \in (\frac{3}{4}, 1)$ there exists $\tilde{\rho} \in (\frac{3}{4}, 1)$ such that $f_n(\tilde{\rho}) \le \rho^n$ for all $n$ large enough. It is easy to see that this is true for any $\frac{3}{4} < \tilde{\rho} < \rho$.
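The last step can be seen numerically: for small $x = \tilde{\rho}^n$ the root $f_n(\tilde{\rho})$ behaves like $x$ itself, so it eventually falls below $\rho^n$ whenever $\tilde{\rho} < \rho$. The sketch below evaluates $f_n$ in an algebraically equivalent form that avoids cancellation; the function name and the sample values of $\rho$ and $\tilde{\rho}$ are assumptions for illustration.

```python
import math

def f(n, rho):
    """f_n(rho) from Lemma 2: the smaller root of z (1 - z) = rho**n, or 1 if there is none."""
    x = rho ** n
    if 1.0 - 4.0 * x < 0.0:
        return 1.0
    # Equal to (1 - sqrt(1 - 4x)) / 2, rewritten to avoid cancellation for small x.
    return 2.0 * x / (1.0 + math.sqrt(1.0 - 4.0 * x))

rho, rho_tilde = 0.85, 0.80   # hypothetical values with 3/4 < rho_tilde < rho < 1
for n in (5, 10, 20, 40, 80):
    print("n =", n, " f_n(rho_tilde) =", f(n, rho_tilde), "  rho^n =", rho ** n,
          "  dominated:", f(n, rho_tilde) <= rho ** n)
```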

D. A bootstrapping argument 

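We now strengthen the result of the previous subsection and complete the proof of the direct part of Theorem 3.

Proposition 3: For any $\beta < \frac{1}{2}$, the sequence $\{2^{-2^{n\beta}}\}$ is u.d. over the class of extremal processes.

Proof: Fix $\beta < \frac{1}{2}$. First note that, for any fixed $k \ge 0$ and any $\beta' \in (\beta, \frac{1}{2})$, asymptotically in $n$ we have $2^{-2^{(n+k)\beta}} = \Omega(2^{-2^{n\beta'}})$ (using standard Landau notation), because $2^{(n+k)\beta} \le 2^{n\beta'}$ for all $n$ large enough. Hence, since $\beta < \frac{1}{2}$ is arbitrary, it suffices to prove that $\{2^{-2^{n\beta}}\}$ is an a.d. sequence.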

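Fix $\{Z_n\}$ as an extremal process. We wish to prove that $Z_n \prec 2^{-2^{n\beta}}$. Consider a second process $\{\tilde{Z}_i\}$ defined by fixing an $n \ge 1$ and an $m \in \{0, \ldots, n\}$ and setting

$$\tilde{Z}_i = Z_i, \quad i = 0, \ldots, m, \qquad \tilde{Z}_{i+1} = \begin{cases} \tilde{Z}_i^2 & \text{if } B_{i+1} = 1, \\ 2\tilde{Z}_i & \text{if } B_{i+1} = 0, \end{cases} \quad i \ge m.$$

A comparison with (20) shows that $Z_i \le \tilde{Z}_i$ for all $i \ge 1$.

Fix $a_n = \sqrt{n}$, and partition the set $\{m, \ldots, n - 1\}$ into $k = (n - m)/a_n$ consecutive intervals $J_1, \ldots, J_k$ of size $a_n$, i.e., $J_j = \{m + (j - 1)a_n, \ldots, m + j a_n - 1\}$. Let $E_j$ be the event that $\sum_{i \in J_j} B_{i+1} < a_n \beta$. Observe that

$$P(E_j) \le 2^{-a_n [1 - H(\beta)]}, \tag{31}$$

where $H(\beta) = -\beta \log_2(\beta) - (1 - \beta) \log_2(1 - \beta)$ is the binary entropy function. Thus the event $G := \bigcap_j E_j^c$ has probability at least $1 - k 2^{-a_n[1 - H(\beta)]}$. Conditional on $G$, during every interval $J_j$ the value of $\tilde{Z}$ is squared at least $a_n \beta$ times and doubled at most $a_n(1 - \beta)$ times; hence, we have

$$\log_2 \tilde{Z}_{m + (j+1)a_n} \le 2^{a_n \beta} \left[ \log_2 \tilde{Z}_{m + j a_n} + a_n(1 - \beta) \right]$$

and so

$$\begin{aligned} \log_2 Z_n \le \log_2 \tilde{Z}_n &\le 2^{(n-m)\beta} \log_2 Z_m + a_n(1 - \beta) \sum_{j=1}^{k} 2^{j a_n \beta} \\ &\le 2^{(n-m)\beta} \left[ \log_2 Z_m + \frac{a_n(1 - \beta)}{1 - 2^{-a_n \beta}} \right] \\ &\le 2^{(n-m)\beta} \left[ \log_2 Z_m + a_n \right] \quad \text{for } n \text{ large enough.} \end{aligned}$$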



Lastly, fix $m = n^{3/4}$. Conditional on $G \cap \{Z_m \le (\frac{7}{8})^m\}$, and for $n$ large enough, we have $\log_2 Z_m \le -n^{3/4} \log_2(8/7)$; hence,

$$\log_2 Z_n \le 2^{(n-m)\beta} \left[ -n^{3/4} \log_2(8/7) + n^{1/2} \right] \le -2^{n\beta(1 - o(1))}.$$

Noting that the probability of $G$ approaches 1, we see by Lemma 2 that the probability of $G \cap \{Z_m \le (\frac{7}{8})^m\}$ approaches $1 - z_0$. This establishes that $Z_n \prec 2^{-2^{n\beta}}$ for any fixed $\beta < 1/2$. ■
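The bound (31) used in the proof above is a standard Chernoff-type estimate on the lower tail of a binomial distribution, and it can be compared with the exact tail probability. The sketch below treats the interval size as a free integer parameter `a` in place of $a_n = \sqrt{n}$; that parameter, the function names, and the value $\beta = 0.4$ are assumptions for illustration.

```python
import math

def binary_entropy(p):
    """H(p) in bits for 0 < p < 1."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def prob_few_squarings(a, beta):
    """Exact probability of fewer than a * beta ones among a fair bits, i.e. P(E_j)."""
    limit = math.ceil(a * beta) - 1          # largest integer strictly below a * beta
    return sum(math.comb(a, ones) for ones in range(limit + 1)) / 2 ** a

beta = 0.4
for a in (25, 100, 400):
    exact = prob_few_squarings(a, beta)
    bound = 2.0 ** (-a * (1 - binary_entropy(beta)))
    print("a =", a, " exact P(E_j) =", exact, "  bound from (31) =", bound)
```

Because the bound decays exponentially in the interval size while the number of intervals $k$ grows only polynomially in $n$, the union bound on the complement of $G$ vanishes, which is the property the proof relies on.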



V. Open problems 



Broadly stated, we have been interested in the asymptotic behavior of the cumulative probabilities $P(Z_n \le z)$ for processes $\{Z_n\}$ derived from a channel polarization problem. The ultimate result in this regard would be to determine explicitly a function $E(n, R)$ such that, for any $R \in [0, 1]$,

$$\liminf_{n \to \infty} P(Z_n \le 2^{-2^{E(n, R)}}) = R. \tag{32}$$

Theorem 3 gives only some partial characterization of $E(n, R)$ for large $n$.

The information-theoretic problem considered in this paper can be generalized in two main directions. First, one may consider the transform $W \mapsto (W^-, W^+)$ for channels with input alphabets $\mathcal{X} = \{1, \ldots, q\}$ for arbitrary $q \ge 2$. In this generalization, the mod-2 addition operation $\oplus$ may be replaced with addition mod-$q$, or even with an arbitrary group operation on $\mathcal{X}$. The process $\{I_n\}$ can be defined as before, the conservation law (3) still holds, and $\{I_n\}$ is a bounded martingale, which must converge a.s. An initial open problem for this case is to prove that channel polarization takes place, i.e., that $\{I_n\}$ converges a.s. to the set $\{0, \log_2 q\}$. Conditional on the validity of channel polarization, a subsequent goal would be to determine the rate of polarization.

Note that for $q \ge 3$, the auxiliary random process $\{Z_n\}$ can be defined only after giving a new definition for the channel parameter $Z(W)$. A natural definition is $Z(W) = \frac{1}{q(q-1)} \sum_{x \ne x'} \sum_{y} \sqrt{W(y|x) W(y|x')}$. Unfortunately, the relations (4) do not hold for this definition, and the process $\{Z_n\}$ does not appear likely to facilitate the analysis for $q \ge 3$.

A second direction for generalization of the problem is to consider more general channel transformations that preserve mutual information. For example, a ternary operation $W \mapsto (W', W'', W''')$ may be considered such that $I(W') + I(W'') + I(W''') = 3I(W)$. The random sequence of channels $\{W_n\}$ can be defined using a ternary fair coin, which ensures that $\{I_n\}$ is a bounded martingale. A major open problem in this general setting is to determine necessary and sufficient conditions on the channel transform to ensure channel polarization.